Skip to content

Conversation

@jerry-024
Copy link
Contributor

@jerry-024 jerry-024 commented Oct 31, 2025

Purpose

format table: support file compression and custom default for diff format

  • parque: snappy
  • orc: zstd
  • csv and json: none

Tests

API and Format

Documentation

@jerry-024 jerry-024 marked this pull request as draft October 31, 2025 09:28
@jerry-024 jerry-024 requested a review from Copilot October 31, 2025 09:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces automatic file compression configuration for format tables based on their file format type. Instead of requiring users to explicitly set 'file.compression'='none' in table properties, the system now automatically determines appropriate compression defaults: none for CSV/JSON, snappy for Parquet, and zstd for ORC.

Key changes:

  • Added FORMAT_TABLE_FILE_COMPRESSION config option with fallback to FILE_COMPRESSION
  • Implemented getFormatTableFileCompression() utility method to determine compression based on format
  • Updated catalog implementations (HiveCatalog, RESTCatalog) to auto-populate file compression during table creation
  • Removed explicit 'file.compression'='none' declarations from test cases

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
paimon-api/src/main/java/org/apache/paimon/CoreOptions.java Added FORMAT_TABLE_FILE_COMPRESSION config option and formatTableFileImplementation() method
paimon-core/src/main/java/org/apache/paimon/catalog/CatalogUtils.java Implemented getFormatTableFileCompression() with format-specific defaults
paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveCatalog.java Auto-set file compression during format table creation
paimon-core/src/main/java/org/apache/paimon/rest/RESTCatalog.java Auto-set file compression during format table creation
paimon-core/src/main/java/org/apache/paimon/table/format/FormatTableFileWriter.java Updated to use formatTableFileImplementation() instead of fileCompression()
paimon-core/src/test/java/org/apache/paimon/catalog/CatalogTestBase.java Added test coverage for format-specific file compression defaults
paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/table/PaimonFormatTableTest.scala Removed explicit file compression settings from test cases
paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/sql/FormatTableTestBase.scala Removed explicit file compression settings from test cases
paimon-spark/paimon-spark-ut/src/test/java/org/apache/paimon/spark/SparkCatalogWithHiveTest.java Removed explicit file compression settings from test cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jerry-024 jerry-024 force-pushed the format_table_compression branch from 7fa9a60 to 94207bd Compare November 6, 2025 05:14
@jerry-024 jerry-024 changed the title [core] format table: support file compression and custom default for … [core] format table: support file compression and custom default for diff format Nov 6, 2025
@jerry-024 jerry-024 force-pushed the format_table_compression branch from 37cef85 to e05a60d Compare November 6, 2025 07:39
@jerry-024 jerry-024 force-pushed the format_table_compression branch from e05a60d to f781e02 Compare November 6, 2025 08:00
@jerry-024 jerry-024 marked this pull request as ready for review November 6, 2025 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant